Goto

Collaborating Authors

 fair clustering


Fair Clustering Under a Bounded Cost

Neural Information Processing Systems

Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness.


Fair Clustering Through Fairlets

Neural Information Processing Systems

We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the \emph{price of fairness} by quantifying the value of fair clustering on real-world datasets with sensitive attributes.


Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

Chakraborty, Diptarka, Chatterjee, Kushagra, Das, Debarati, Nguyen, Tien-Long

arXiv.org Artificial Intelligence

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].


Fair Clustering via Alignment

Kim, Kunwoong, Lee, Jihu, Park, Sangchul, Kim, Yongdai

arXiv.org Machine Learning

Algorithmic fairness in clustering aims to balance the proportions of instances assigned to each cluster with respect to a given sensitive attribute. While recently developed fair clustering algorithms optimize clustering objectives under specific fairness constraints, their inherent complexity or approximation often results in suboptimal clustering utility or numerical instability in practice. To resolve these limitations, we propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function. The proposed algorithm, called Fair Clustering via Alignment (FCA), operates by alternately (i) finding a joint probability distribution to align the data from different protected groups, and (ii) optimizing cluster centers in the aligned space. A key advantage of FCA is that it theoretically guarantees approximately optimal clustering utility for any given fairness level without complex constraints, thereby enabling high-utility fair clustering in practice. Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability.


Fair Clustering with Clusterlets

Setzu, Mattia, Guidotti, Riccardo

arXiv.org Artificial Intelligence

Given their widespread usage in the real world, the fairness of clustering methods has become of major interest. Theoretical results on fair clustering show that fairness enjoys transitivity: given a set of small and fair clusters, a trivial centroid-based clustering algorithm yields a fair clustering. Unfortunately, discovering a suitable starting clustering can be computationally expensive, rather complex or arbitrary. In this paper, we propose a set of simple \emph{clusterlet}-based fuzzy clustering algorithms that match single-class clusters, optimizing fair clustering. Matching leverages clusterlet distance, optimizing for classic clustering objectives, while also regularizing for fairness. Empirical results show that simple matching strategies are able to achieve high fairness, and that appropriate parameter tuning allows to achieve high cohesion and low overlap.


FACROC: a fairness measure for FAir Clustering through ROC curves

Quy, Tai Le, Thanh, Long Le, Hong, Lan Luong Thi, Hopfgartner, Frank

arXiv.org Artificial Intelligence

Fair clustering has attracted remarkable attention from the research community. Many fairness measures for clustering have been proposed; however, they do not take into account the clustering quality w.r.t. the values of the protected attribute. In this paper, we introduce a new visual-based fairness measure for fair clustering through ROC curves, namely FACROC. This fairness measure employs AUCC as a measure of clustering quality and then computes the difference in the corresponding ROC curves for each value of the protected attribute. Experimental results on several popular datasets for fairness-aware machine learning and well-known (fair) clustering models show that FACROC is a beneficial method for visually evaluating the fairness of clustering models.


Fair Clustering Under a Bounded Cost

Neural Information Processing Systems

Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness. The relative increase in the cost, the ''price of fairness,'' can indeed be unbounded. Therefore, in this paper we propose to treat an upper bound on the clustering objective as a constraint on the clustering problem, and to maximize equality of representation subject to it. We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.


Reviews: Fair Clustering Through Fairlets

Neural Information Processing Systems

While there is a growing body of work on fair supervised learning, here the authors present a first exploration of fair unsupervised learning by considering fair clustering. Each data point is labeled either red or blue, then an optimal k-clustering is sought which respects a given fairness criterion specified by a minimum threshold level of balance in each cluster. Balance lies in [0,1] with higher values corresponding to more equal numbers of red and blue points in every cluster. This is an interesting and natural problem formulation - though a more explicit example of exactly how this might be useful in practice would be helpful. For both k-center and k-median versions of the problem, it is neatly shown that fair clustering may be reduced to first finding a good'fairlet' decomposition and then solving the usual clustering problem on the centers of each fairlet.


Fair Clustering: Critique, Caveats, and Future Directions

Dickerson, John, Esmaeili, Seyed A., Morgenstern, Jamie, Zhang, Claire Jie

arXiv.org Artificial Intelligence

Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.


Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric

Zeng, Pengxin, Li, Yunfan, Hu, Peng, Peng, Dezhong, Lv, Jiancheng, Peng, Xi

arXiv.org Artificial Intelligence

Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (\textit{e.g.}, gender, race, RNA sequencing technique) from dominating the clustering. Although a number of works have been conducted and achieved huge success recently, most of them are heuristical, and there lacks a unified theory for algorithm design. In this work, we fill this blank by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. In brief, through maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly expected by deep fair clustering, \textit{i.e.}, compact, balanced, and fair clusters, as well as informative features. Besides the contributions to theory and algorithm, another contribution of this work is proposing a novel fair clustering metric built upon information theory as well. Unlike existing evaluation metrics, our metric measures the clustering quality and fairness as a whole instead of separate manner. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks including a single-cell RNA-seq atlas compared with 11 state-of-the-art methods in terms of five metrics. The code could be accessed from \url{ https://pengxi.me}.